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With the widespread availability of satellite-based instruments, 
C"| ■ many geophysical processes are measured on a global scale and they 

' often show strong nonstationarity in the covariance structure. In this 

paper we present a flexible class of parametric covariance models that 
\& • can capture the nonstationarity in global data, especially strong dc- 

s ! ' pendency of covariance structure on latitudes. We apply the Discrete 

Fourier Transform to data on regular grids, which enables us to cal- 
^ ' ' culate the exact likelihood for large data sets. Our covariance model 

is applied to global total column ozone level data on a given day. 
We discuss how our covariance model compares with some existing 
models. 
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1. Introduction. In geophysical and environmental applications, it is 
common to have spatial data covering a large portion of the Earth. Such 
■ data often show nonstationary covariance structure on a global scale. In 

particular, covariance structures can strongly vary with latitude, although 
the process may be roughly stationary with respect to longitude. 
£C) ■ One example of such data is the Level 3 Total Ozone Mapping Spectrom- 

eter (TOMS) data, which gives the daily total column ozone levels. The data 
' are given on a spatially regular grid (1° latitude by 1.25° longitude away 

from the poles) and there are not many missing observations. A more de- 
tailed description of the data is given in Section 4.1. Figure 1(a) shows the 
standard deviations of the TOMS measurements on May 14, 1990 evaluated 
at each longitude and (b) shows the similar quantity at each latitude. From 
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Fig. 1. Comparison of the empirical (dots) and fitted (colored lines) standard deviations 
for models A, B, C, F and H. The details on how each quantity are calculated are given 
in Appendix A. Note that (a) does not have fitted values since the quantity plotted is with 
respect to longitude and is constant under axial symmetry. 

these figures, it is clear that there is a strong dependence of the covariance 
structure on latitude but not so much dependence on longitude. Table 3 
gives the directional variogram values at small scales for a few fixed latitude 
levels with the same data and the empirical values show strong dependence 
on latitude. 

A number of papers have developed nonstationary spatial covariance func- 
tions. Spatial deformations to model nonstationary spatial processes have 
been used, for example, in Sampson and Guttorp (1992), Perrin and Senoussi 
(2000), Schmidt and O'Hagan (2003), Clerc and Mallat (2003) and 
Anderes and Stein (2008). Kernel convolution and its variants have been 
applied in several papers to create nonstationary covariance functions such 
as in Higdon (1998), Higdon, Swall and Kern (1999), Fuentes (2002) and 
Paciorek and Schervish (2006). Nychka, Wikle and Royle (2002) use a wavelet 
approach to produce nonstationary covariance functions. However, none of 
these works are suitable for global data since their spatial domain is M. d , not 
a sphere. Paciorek and Schervish (2006) discuss the possibility of extending 
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their models so that they can be suitable for a spherical domain, but nothing 
has been explicitly developed or implemented. 

There have been several attempts in developing covariance models for 
global processes. Yaglom (1987) and Gneiting (1999) note that restricting 
valid isotropic covariance functions in M 3 gives valid covariance functions on 
a sphere. This idea is simple yet powerful and is able to produce a range of 
valid covariance functions on a sphere, but only isotropic ones. Another way 
to generate isotropic covariance functions on spheres is through expansions 
in spherical harmonics. However, as Jun and Stein (2007) point out, one 
of the main difficulties in developing covariance functions this way is the 
mathematical challenge of finding explicit expressions for the covariance 
function. There are few known closed forms from the spherical harmonics 
expansion and almost all of these give analytic covariance structures [e.g., 
Das (2000)], which are too smooth for practical application. 

Cressie and Johannesson (2008), Jun and Stein (2007) and Stein (2007, 
2008) all consider nonstationary statistical models for total column ozone on 
a global scale. Cressie and Johannesson (2008) model nonstationary covari- 
ance structure on a sphere through a covariance matrix of fixed rank (plus a 
diagonal matrix), which enables fast computation of the covariance matrices. 
They allow nonstationarity across latitudes and longitudes. If we only want 
nonstationarity across latitudes, then we can use what Jones (1963) calls an 
axially symmetric process whose first two moments are invariant to rotations 
about the Earth's axis. All of the approaches in Jun and Stein (2007) and 
Stein (2007, 2008) assume that the process is axially symmetric. Similar to 
Cressie and Johannesson (2008), Stein (2007) uses covariance structures of 
fixed rank to speed computations, but despite having more than 170 covari- 
ance parameters, the resulting model does not do a good job of describing 
the local behavior of total column ozone. Stein (2008) adds a compactly sup- 
ported covariance function to this model, which helps to better capture the 
local fluctuations while maintaining some of the computational advantages 
of the fixed rank covariance matrices, but still has some problems appropri- 
ately describing the nonstationarity across latitudes in the data. The model 
used in Cressie and Johannesson (2008) has 396 parameters, but it is not 
clear to us that even this model can accurately capture the local behavior 
of the process. 

The approach described in Jun and Stein (2007) produces a class of rich 
and flexible nonstationary covariance models for either spatial or spatial- 
temporal processes with explicit expressions and without requiring nearly as 
many parameters as approaches based on series expansions. The key idea is 
to apply differential operators with respect to latitude, longitude and time 
to a homogeneous process on sphere x time domain. They give nonrandom 
functions depending on latitude as coefficients for each differential operator. 
However, Jun and Stein (2007) do not provide much guidance on how to 
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model the coefficients of the differential operators and their relationship to 
the resulting covariance structure. The present work seeks to fill this gap 
by providing some specific approaches for modeling these coefficients of the 
differential operators. 

Whenever we deal with large spatial datasets, we face problems in mem- 
ory and computation. The covariance functions used here produce covari- 
ance matrices that are neither low rank nor sparse, so exact likelihood cal- 
culations are not possible with large numbers of irregularly sited observa- 
tions. One possible solution is to approximate the likelihoods to overcome 
the computational obstacles [Vecchia (1988), Stein, Chi and Welty (2004), 
Caragea and Smith (2006), Fuentes (2007)]. For spatially gridded data, how- 
ever, it is sometimes possible to compute the exact likelihood even for quite 
large datasets. 

In this paper we propose a simple yet flexible way of modeling nonsta- 
tionary covariance functions on a sphere. We use linear combinations of 
Legendre polynomials to represent the coefficients of partial differential op- 
erators in the approach of Jun and Stein (2007) and find that only a few 
terms of these polynomials are enough to capture most of the nonstationar- 
ity and the local and global variations in the process. Furthermore, by using 
gridded data and exploiting the structure of the resulting covariance matrix, 
we achieve great savings in memory and in computation of the likelihood. 
Our method is applied to global total column ozone levels and the results 
show that our covariance models with only a modest number of parameters 
can capture the strong nonstationarity with respect to latitudes as well as 
the local variation of the process well. 

2. Nonstationary covariance models through differential operators. For 

a process Z(L,l) on a sphere (L and I denote latitude and longitude, resp.) 
such that for a suitable function K, Cov{Z(Li,l\), Z(L2,h)} = K(L\, L2, l\ — 
I2), that is, Z is stationary with respect to longitude, we say the process is 
axially symmetric [Jones (1963)]. Jun and Stein (2007) propose a class of ax- 
ially symmetric covariance models for space-time processes that has flexible 
space-time interactions. For a spatial process, this approach gives a natural 
way of producing nonstationarity on a purely spatial domain, for instance, 
different covariance structure along different latitude levels, while retaining 
the simplification of axial symmetry. 

We call a spatial process on a globe homogeneous if its covariance function 
only depends on the great circle distance (or, equivalently, chordal distance) 
between two locations. Note that the chordal distance between two locations 
on a globe, (Li, li), i = 1,2, is given by 




+ cos L\ cos L2 sin 
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Here, R is the radius of the Earth. The great circle distance between the two 
locations is gc(Li, L2J1 — h) = 2i?arcsin{ch(Li, L2, h — l2)/(2R)}. Let us 
assume Zq is homogenous and has mean zero. Then, following Jun and Stein 
(2007), define a mean zero process Z by 



where A and B denote nonrandom functions depending on latitude. As 
explained in Jun and Stein (2007), by making A and B independent of lon- 
gitude, the resulting process Z is axially symmetric, while it is naturally 
nonstationary with respect to latitude. Note though that we can also let the 
functions A and B depend on longitude if we do not want axial symmetry. 

Although (1) gives a natural and flexible way of modeling physical pro- 
cesses on a sphere with nonstationary covariance structure, we need to 
choose the forms for the functions A and B in any specific application. 
Jun and Stein (2007) modeled ozone levels over a limited latitude band and 
assumed that A{L) and B{L)/ cosL are constants, which made Var{Z(L, !)} 
independent of latitude. However, as Figure 1(b) shows, this assumption is 
inappropriate for total column ozone levels on a global scale. 

The covariance models with A(L) and B(L)/ cosL being constants yield 
constant variance, but it is easy to show that the resulting covariance func- 
tion is not homogeneous. The covariance models proposed in Jun and Stein 
(2007), however, do include homogeneous models in some sense by adding a 
homogeneous process to the field Z in equation (1). That is, by letting 



with C a constant, Z reduces to a homogeneous model when A(L) and B(L) 
are equal to zero. 

Note that correlation values from any homogeneous covariance models 
in M 3 cannot go below inf s > ^ » -0.218 [Stein (1999)] and Matern co- 
variance models cannot give negative correlations. However, the differential 
operators in (1) enable the covariance models to produce negative correla- 
tion values down to —1. Indeed, it is easy to show that, for any mean square 
differentiable Matern model, -^Zq(L,1) has, as the range parameter tends to 
infinity, correlation tending to —1 for two points at longitudes 180° apart. 

2.1. Models for A and B functions. The approach in (1) may be flexi- 
ble, but the lack of methodology for selecting and estimating the A and B 
functions is an impediment to applying the models in real applications to 
global data. Our idea for modeling A(L) and B(L) is to use linear combi- 
nations of Legendre polynomials, which is a class of orthogonal polynomials 
suited for functions on spheres. Let us consider a linear combination with 



(1) 



Z(L,l) = l [ A(L)-^ + B(L)^Z (L,l) 




A(L)— + B(L)- \Zo(L,l) + CZ (L,l) 
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Legendre polynomials up to order m, P(L;po, . . . ,p m ) =J2iLoPiPi( s ^ n ^)^ 
where Pi denotes the Legendre polynomial of order i. Then, we let A(L) = 
P(L; ao, . . . , a p ) and B(L) = P(L; bo, . . . , b q ) for certain integers p and q and 
constants ao, • • ■ , a p , bo, . . . , b q . We set ao = 1 to avoid identifiability prob- 
lems. Specifically, if ao and 60 are allowed to vary freely, we cannot identify 
all the parameters, ao, 60 an d the sill parameter for Zq. Our goal is, with 
only a modest number of parameters (if p = q = 3, we need 7 parameters 
for A and B) compared to Stein (2007, 2008), to capture nonstationarity at 
both large and small scales. 

We may have to extend the model (1) to make it more flexible. Since the 
covariance function of Z in (1) involves not only the squared terms of the 
functions A and B but their cross product terms, once the values of A and 
B are fixed, A ■ B is determined completely. Jun and Stein (2007) note this 
restriction and propose the following extension of (1): for Z\, . . . , Zjy, i.i.d. 
copies of Z Q , let Z(L, I) = J2k=i{ A k(L) + -B fc (L)^}Z fe (L, I). For example, 
we may have 

(2) Z(L,l) = ^A 1 (L)^ L +B 1 (L)^Z 1 (L,l) + A 2 (L)^ L Z 2 (L,l). 

This model gives an "interaction" between latitude and longitude through 
A\ ■ B\ that produces a term depending on both magnitude and the sign 
of longitude lag. Stein (2007) calls a process longitudinally reversible if the 
covariance of the process depends on the longitude lag only through its mag- 
nitude. He showed that Level 2 TOMS data is not longitudinally reversible. 
From the model in (2), the interaction term A\ ■ B\ produces covariance 
models that are not longitudinally reversible. The model in (2) allows fur- 
ther nonstationarity with respect to latitude through A 2 . 

As in Jun and Stein (2007), the process has singularities at the poles 
unless \\m. L ^ / 2 {A{L) 2 + B(L) 2 cos 2 L} = 0. The Legendre-polynomial-basis 
in general does not solve this problem. If we let A{L) = ^™1q ajPj(sinL) and 
B(L) = J27=o biPi(sinL), it is straightforward to verify that the condition for 
the process being mean square continuous at the poles is X^=o a * = 0- 

3. Fast computation using the discrete Fourier transform. In this sec- 
tion we show how to compute the exact full likelihood of data on a grid of 
288 longitudes and 100 latitudes. The key idea is the "diagonalization" of 
the covariance matrix through the Discrete Fourier Transform (DFT) for 
a block circulant matrix. Note that a block circulant matrix is completely 
determined by its first block column. 

Suppose a process is observed over the full longitude range from —180° 
to 180° and full or partial latitude range. Suppose further that the observed 
sites are on regular grids, {(Li,lj) :% = !,... ,m,j = 1, . . . ,n}, |Lj — = 
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\L m — Li\/(m — 1), i = 2, . . . ,m, and |/j — lj-i\ = 360/n for all j = 1, . .. ,n. 
Let us denote the observed processes 

Z = {Z(Li, li), . . . ,Z(L m ,li); . . . ;Z(Li,l n ), . . . ,Z(L m , l n )} 

and 

Z = {Z(L\, li), . . . ,Z(Li, l n ); . . . ;Z(L m , ,Z(L m , l n )}. 

Notice that Z is the same as Z with the elements rearranged. Now, if the 
process Z is axially symmetric, it is easy to see that the covariance matrix of 
Z is block circulant. Or, the covariance matrix of Z is an n x n block matrix 
with mxm circulant blocks. Theorems 5.6.4 and 5.7.2 in Davis (1979) state 
that, through the DFT, the covariance matrix of Z can be diagonalized 
and become a block diagonal matrix; the same diagonalization holds for the 
covariance matrix of Z after rearranging the rows and columns. In particular, 
if we let Zil = {Z(L, . . . , Z(L, l n )} for any latitude L and write F for the 
DFT operator, the covariance matrix of {F7jl 1 , . . . , FZiL m } consists of n x n 
diagonal matrices as its blocks. Then it is easy to see that the covariance 
matrix of FZ should be a block diagonal matrix whose block size is m x m, 
since it is just the covariance matrix of {FZl 1 , ■ ■ ■ ,FZiL m } with rows and 
columns rearranged. 

Now, instead of directly calculating the likelihood of Z, we calculate the 
likelihood of FZ whose covariance matrix is a block diagonal matrix. Hence, 
instead of computing the Cholesky decomposition of a mn x mn matrix, we 
only need to decompose n matrices of size mxm matrix. As long as the data 
are on regular grids with full longitude range and no missing data, the com- 
putation of the exact likelihood is very fast using the fast Fourier transform. 
Note that in calculating loglikelihood, we need m DFTs and n Cholesky 
decompositions of matrices of size mxm. Thus, the total computational 
complexity for DFT is O(mnlogn) and that for the Cholesky decomposi- 
tion is 0(m 3 n). In terms of memory, since circulant matrices are determined 
by their first column, we do not have to save the entire covariance matrix. 
This reduces the size of required memory substantially. 

4. Application: TOMS Level 3 data. We now show the application of 
the covariance models to total column ozone level data on a global scale. 

4.1. Data. During the period of November 1, 1978 to May 6, 1993, 
Nimbus 7 carried a TOMS instrument and the data from this satellite are 
either in Level 2 or Level 3 versions. TOMS Level 2 data have been analyzed 
in some recent papers in the statistical literature, including 
Cressie and Johannesson (2008) and Stein (2007, 2008). Level 2 data give 
spatially and temporally irregular measurements following the satellite scan- 
ning tracks (measurements are 8 seconds apart) and there is a significant 
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amount of missing observations. TOMS Level 3 data are post processed from 
Level 2 data and they are on regular grids [1 degree latitude by 1.25 degrees 
longitude for pixels with latitude from 50° S to 50° N, see Krueger et al. 
(1998) for more details] as daily averages. TOMS Level 3 data are ob- 
tained from an ad hoc method to average Level 2 data pixel by pixel. Both 
Levels 2 and 3 data, along with more information on the data, can be 
obtained from http://toms.gsfc.nasa.gov/ozone/ozone_v8.html. 
Cressie and Johannesson (2008) produce new Level 3 data through statisti- 
cal models rather than ad hoc averaging. Although there is loss of informa- 
tion in Level 3 data, especially fine scale spatial and temporal variations, 
data on grids with global coverage and few missing observations are conve- 
nient to focus on the study of the covariance structure of the process purely 



monthly average 




Fig. 2. Monthly average and daily data minus monthly average of TOMS ozone Level 
3 data. Top figure is for monthly average of May 1990, middle for May 14 1990 and the 
bottom for May 15 1990. 
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Fig. 3. Estimated mean structure of the data for May 14 and May 15, 1990, using spher- 
ical harmonics ({Y™{smL, l)\n — 0,1,2, ... ,m = —ri, ... ,n} for n = 12, see Section 4-2), 
and the residuals (data minus estimated mean). 



due to the computational efficiencies described in Section 3. Our covari- 
ance models are not restricted to data on regular grids, but exact likelihood 
computations would then require much more memory and computation. 

We use the original NASA-produced TOMS Level 3 data (Level 2 data 
aggregated through pixel by pixel averaging) for May 14-15, 1990. There 
are 288 longitude points (evenly spaced over —180° to 180°) and 100 lati- 
tude points (evenly spaced over 50° S~50° N) . The data from latitude levels 
beyond this range have fewer than 288 observations per latitude. The size of 
the data is thus 28,800 per day. There are still a few pixels with missing data 
(e.g., out of 28,800 pixels 62 pixels for May 14 and 71 for May 15) and we 
naively impute these locations. That is, the missing pixels are filled with the 
average of available data from the 8 neighboring cells. Considering the small 
fraction of missing observations, the method of imputation should not af- 
fect the fitted results significantly. In fact, we tried various other imputation 
methods and the results do not change much. 

4.2. Modeling the mean structure. Since we focus on modeling the co- 
variance structure of this data, we should somehow filter the data and make 
the process close to mean zero. We first tried to subtract the monthly aver- 
age of May from the data for each day in May. After this filtering, we would 
expect the residual should be closer to mean zero and Gaussian. However, as 
Figure 2 shows, the residuals have noticeable patterns. For example, for both 
May 14 and 15, there are hot spots in the high and low latitude area. While 
these phenomena could be explained by Rossby waves [Holton (1992)], our 
covariance models developed here do not have the ability to capture such 
patterns and so we seek to model these patterns with the mean function. 

Spherical harmonics provide a natural basis for capturing the large-scale 
patterns displayed in Figure 2. This is similar to what has been done in Stein 
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(2007, 2008). Specifically, we regress the ozone levels with {Y™(sinL,l)\n = 
0, 1, 2, ... ,m = — n, . . . , n} for n = 12. Figure 3 shows the estimated mean on 
May 14-15, 1990 and the residuals. Note that the spherical harmonics terms 
capture most of the patterns in the mean and the residuals do not have the 
noticeable patterns that we see in Figure 2. We tried a few different values 
of n around 12 and the filtered results are not sensitive to the choice of n. 

4.3. Tapering of the data. An issue with the TOMS Level 3 data is the 
discontinuity at the international date line (IDL) [Fang and Stein (1998)]. 
TOMS Level 3 data are post processed data from the original satellite scans 
and they give daily averages for the full range of longitudes. Therefore, 
there are substantial discrepancies in the data at the IDL, in particular, at 
longitudes ±179.375°. However, since our model is purely spatial and axially 
symmetric, it cannot model a noticeable discrepancy at the IDL. 

A natural way of dealing with this issue is data tapering. To explore the 
effect of tapering, at each latitude level, we tapered the data by a split cosine- 
bell taper [Tukey (1967)] with 5% of each end of the data tapered. Note 
though that our data are actually on a circle. Therefore, the taper set the 
ozone levels at longitudes ±180° to be zero and the data at longitude ranges 
(approximately) —180° to —163° and 163° to 180° are tapered accordingly. 
We do not taper the original data but the residual after subtracting the 
mean estimated in Section 4.2. 

Note that in this case the process is not axially symmetric anymore 
and, thus, the covariance matrix is no longer block circulant, but only 
block Toeplitz. Thus, the covariance matrix of FZ is block diagonal only 
asymptotically. We tried fitting the model to the tapered data in two dif- 
ferent scenarios. First, we simply ignored the fact that the process is no 
longer axially symmetric and performed the computation as in Section 3. 
A second approach is that we approximated the covariance matrix of FT} 
(Z* = {Z*(Li, h), . . . , Z\L m , h); . . . ; Z'(Li,Z n ), . . . , Z\L m ,l n )}) by a block 
diagonal matrix, but we modified each block in the diagonal so that the 
block diagonals give the covariances that take account of the tapering. We 
applied both approaches for more than 20 days of May, 1990 (each day fit 
separately) and found that the parameter estimates for the two approaches 
are quite similar. Therefore, for computational simplicity, we decided to ap- 
ply the first approach ignoring the effect of tapering in fitting the covariance 
functions. 

4.4. Covariance models. We compare the nonstationary models devel- 
oped in Section 2.1 along with a few other nonstationary covariance models. 
The covariance models that we compare here are axially symmetric and in 
nested format. 
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Table 1 

Covariance models and the increase of maximized loglikelihood values ( denoted by 
A loglikelihood) compared to the maximized loglikelihood value for A, which is —128568.9 



Model ABCDEFGHI J 



m 0360033366 

ni 3636666 

n 2 3630666 

n 3 6 



# parameters 4 7 10 14 20 17 16 23 26 30 
A loglikelihood 1764.1 1900.1 1867.6 2052.3 2671.8 2714.6 2803.4 2871.6 2914.3 



Suppose Zq in (1) has mean zero and a homogeneous covariance function 
Kq. In particular, let Kq be a Matern covariance function when viewed as a 
covariance function on R 3 : 

(3) Cov{Z (£i, h), Z (L 2 ,l 2 )} = K (d; a, (3, v) = a(d/ (3) v X v (d/ (3), 

where /C„ is the modified Bessel function of the third kind of order v [Stein 
(1999)] and d = ch{L\,L 2 ,li — l 2 ). Note that (3 is the spatial range param- 
eter. Our simplest model for Z is a rescaled version of Zq with a nugget 
effect: Z(L,l) = P(L; ko, . . . , k m ) Zq(L,1) plus a nugget effect, yielding the 
covariance function 

K 1 (L 1 ,L 2 ,l;a,P,u,e,ko,.. .,k m ) 

(4) = P(Li;ko, k m )P(L 2 ; ko,..., k m )K (d; a, (3, v) 

+ £ ■ 1 (L 1 -L 2 =l=0), 

for d = ch(Li, L 2 , 1). By multiplying Zq by the function P(L; ko, . . . , k m ), 
K\ can capture the strong dependence of variance on latitude through a 
fairly simple model. To avoid identifiability problems, we set ko = 1- When 
m = 0, the variance of the process is constant with respect to latitude and 
the process is isotropic. 

Next, we add the component developed in Section 2.1 to K\. That is, 

K 2 (Li,L 2 ,l;a,/3,u,e,k , . . ., k m ,ai,/3x,vi,ao, . . . ,a ni ,b , . . .,b n2 ) 

(5) = K z {L 1 ,L 2 ,l;ai,Pi,vi,ao,.. .,a ni ,b ,.. -,b n2 ) 

+ Ki (Li,L 2 , 1; a, (3, v, e,k ,..., k m ). 

Here, Kz in (5) is the covariance function of (1) with a±, (3\ and v\ being the 
parameters for (3), A(L) = P(L;ao, . . . ,a ni ) and B(L) = P(L; bo, . . . , b n2 ). 
To avoid identifiability problems, we set ko = ao = 1- We also have an ex- 
tended version of K 2 , denoted by K' z , obtained by using Z as in (2) for the 
process Z in (5). That is, K' Z (L 1 ,L 2 , 1; ai,p 1} v x , a , ... , a ni ,b , ■ ■ ■ , b n2 , c , . . . , 
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c n 3 ) =Kz(Li,L 2 ,l;ai,Pi,v 1 ,a , . . . ,a ni ,b , . . .,b n2 )+K z (L 1 ,L 2 ,l;a 1 ,Pi,ui, 
Co, ... , c„ 3 ) with A 2 (L) = P(L; cq, . . . , c n3 ). A summary of the covariance 
models with their numbers of parameters is given in Table 1. 

4.5. Fitted results. The full likelihoods are calculated using the discrete 
Fourier transform as described in Section 3; we use the R functions nlm 
and optim to maximize the likelihoods. We tried several initial points for 
each optimization and found that all of the optimizations yielded the same 
maximum points. Table 1 gives the maximized loglikelihood values for each 
covariance model. First notice that the rescaled isotropic model with 3 ad- 
ditional parameters (model B) compared to the simplest isotropic model 
(model A) increases the loglikelihood by over 1700. This large increase is 
expected since there is clear latitude dependency of variance as shown in 
Figure 1(b). By adding the Kz function in (5) to the rescaled isotropic 
model, we get an additional substantial increase of the loglikelihood. That 
is, the loglikelihood of model F increases by about 907 and that of model 
G increases about 950 from the loglikelihood of model B. Models D and E, 
which have a K z component but not the rescaled version of the isotropic 
part in K\, do not fit the data as well as models F or G, despite model E hav- 
ing more parameters than models F and G. Model J, which has covariance 
function K' z as described below (5), does not improve the fit dramatically 
over model G. 

The estimated covariance parameters (MLE) for some of the models in 
Table 1 along with their asymptotic standard errors are presented in Table 2. 
The unit of spatial distance is km. The asymptotic standard errors are given 
by the square roots of the diagonal elements of the inverse Hessian matrix, 
evaluated at the MLE values. For model H, even though the optimization by 
the R functions nlm and optim claimed that it reached the maximum point, 
the Hessian matrix was almost singular. This is due to two parameters, a\ 
and Pi, for which the data provide relatively little information compared 
to the other parameters. Figure 4 shows the shape of profile loglikelihood 
for model H. The profile loglikelihood is indeed flat for parameters (3\ and 
u\ compared to parameters and v, respectively. The two parameters, a 
and a.\, are not comparable in magnitude and, thus, direct comparison of 
the shape of the loglikelihood from this figure is not appropriate. Because 
of this problem with model H, we fixed the values for the two parameters, 
ax and Pi, from the first optimization and ran the optimization again for 
the other parameters. The asymptotic standard errors given for model H 
are from the second optimization. Notice first that the MLE values of the 
sill, spatial range and smoothness parameters for the three models do not 
vary much. The estimated nugget is somewhat bigger for model B compared 
to the other two models. One of the big differences between model B and 
models F and H is that models F and H have the differential operators terms 
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Table 2 

Maximum likelihood estimates of covariance parameters for May 14, 1990 along with 
their asymptotic standard errors. For model H, standard errors are based on treating ot\ 
and f3i as known (see Section 4-5 for details) 





B 


F 


H 


a 


64.89 (2.64) 


73.59 (3.81) 


73.98 (3.99) 


n 

P 


218.65 (8.88) 


260.20 (11.54) 


262.88 (11.82) 


V 


1.20 (0.038) 


1.23 (0.040) 


1.26 (0.040) 


E 


1.76 (0.089) 


0.41 (0.11) 


0.49 (0.11) 


fcl 


0.48 (0.023) 


0.46 (0.034) 


0.51 (0.035) 


ki 


0.81 (0.023) 


1.061 (0.024) 


1.08 (0.024) 


fa 


0.071 (0.030) 


0.15 (0.041) 


0.22 (0.044) 






6.13e-05 (6.03e-06) 


4.81e-06 (— ) 


01 




53.14 (1.32) 


58.65 (— ) 


V\ 




2.5 (2.11e-04) 


2.25 (1.51) 


"1 




0.34 (0.069) 


2.36 (0.84) 


a-. 




-0.034 (0.061) 


-15.65 (0.89) 


a 3 




-0.045 (0.078) 


0.62 (1.22) 








-17.13 (0.88) 


as 






-0.11 (0.69) 


a,, 






-10.42 (0.53) 


bo 




0.15 (0.036) 


3.07 (1.44) 


61 




0.57 (0.37) 


-7.18 (2.64) 


62 




0.89 (0.034) 


13.91 (0.52) 


63 




0.27 (0.033) 


-13.14 (4.33) 


64 






10.90 (0.39) 


h 






-7.55 (2.39) 


b 6 






6.66 (0.62) 



while model B does not. Model B simply has spatially varying variance 
terms but its correlation structure is homogeneous. Therefore, some of the 
nonstationarity in the data is not accounted for in model B and we get larger 
estimated values for the nugget effect. Some of the aj's and ftj's have rather 
large asymptotic standard errors, which may be a sign that the data do not 
provide enough information to estimate these particular parameters well. 
From Figure 4, it is interesting to note that the profile loglikelihood has two 
local modes with respect to 02 for model H. 

Figure 1(b) shows the comparison of the empirical and fitted values for 
the standard deviations, K(L, L, 0) 1//2 , with respect to L. The covariance 
model A naturally gives constant standard deviations. The other models 
give similar fits except for model C, which creates a spurious pattern around 
the equator. This spurious pattern may be a sign that we need bigger m for 
model C since the model is without the asymmetric term and is thus less 
flexible. In Figure l(c)-(e), it is clear that the fit to the data improves 




ai b, 




-3Q -sto -10 a id Jo -10 -S o 5 m 



Fig. 4. Shape of profile loglikelihood for model H. Top: profile loglikelihood with respect 
to ot (left) and ot\ (right). Middle: profile loglikelihood with respect to (3 and u (black) and 
j3\ and v\ (red). Bottom: profile loglikelihood with respect to ai 's and hi 's for i = (only 
for bo ) and i — 1, ... ,4. 



significantly by adding the asymmetric term as in Ki- Model H captures 
the patterns near the equator in the data quite well in spite of the problem 
in convergence of the optimization mentioned in the previous paragraph. 
Model C gives similar patterns as model H, but it mostly overestimates 
variation throughout all latitude levels. 

Table 3 gives the directional variogram values for the two nearest points in 
various directions and the corresponding fitted values. In comparing values 
across latitudes, we have to be careful since the distance that corresponds 



NONSTATIONARY COVARIANCE MODELS FOR GLOBAL DATA 



15 



Table 3 

The square root of the directional variograms at fixed latitudes. The details on how the 
quantities are calculated are given in Appendix B 



Empirical 

A 

B 

C 

F 

H 

Empirical 

A 

B 

c 

F 
H 

Empirical 

A 

B 

c 

F 
H 

Empirical 

A 

B 

C 

F 

H 



0.5° S 



20.5° S 



Latitude SE 



4.03 
5.74 
3.97 
4.65 
3.67 
3.94 

3.55 
5.61 
3.99 
3.60 
3.84 
3.67 

5.08 
5.25 
4.79 
5.10 
4.93 
5.05 

5.99 
5.10 
5.05 
5.29 
5.26 
5.23 



SW W Latitude NW 



40.5° S 



46.5° S 



3.53 
4.30 
3.13 
3.57 
3.34 
3.71 

3.23 
4.30 
3.20 
2.91 
3.31 
3.05 

5.15 
4.30 
3.98 
4.21 
4.11 
4.31 

4.44 
4.30 
4.28 
4.47 
4.45 
4.35 



3.99 
5.74 
3.97 
4.65 
3.96 
4.33 

3.81 
5.61 
3.99 
3.60 
4.01 
3.77 

6.04 
5.25 
4.79 
5.10 
4.94 
5.10 

5.57 
5.10 
5.05 
5.29 
5.25 
5.08 



3.23 
4.95 
3.51 
4.05 
3.18 
3.44 

2.95 
4.75 
3.45 
3.15 
3.18 
3.02 

3.21 
4.16 
3.84 
4.04 
3.67 
3.75 

4.52 
3.90 
3.88 
4.07 
3.74 
3.71 



0.5° N 



20.5° N 



40.5° N 



46.5° N 



3.66 
5.74 
4.04 
4.68 
4.01 
4.39 

4.29 
5.61 
5.51 
4.73 
5.20 
4.77 

9.21 
5.25 
7.84 
8.28 
7.72 
8.06 

9.03 
5.10 
8.50 
8.74 
8.53 
8.87 



N 

3.18 
4.30 
3.18 
3.59 
3.39 
3.75 

3.38 
4.30 
4.25 
3.70 
4.35 
3.93 

7.07 
4.30 
6.33 
6.70 
6.51 
6.88 

7.25 
4.30 
7.05 
7.27 
7.32 
7.50 



NE 

3.79 
5.74 
4.04 
4.68 
3.72 
3.98 

3.73 
5.61 
5.51 
4.73 
5.17 
4.85 

7.86 
5.25 
7.84 
8.28 
8.00 
8.22 

8.89 
5.10 
8.50 
8.74 
8.85 
8.77 



E 

3.12 
4.95 
3.54 
4.07 
3.20 
3.46 

2.98 
4.75 
4.63 
4.01 
4.12 
3.84 

6.48 
4.16 
6.06 
6.39 
5.82 
5.92 

6.48 
3.90 
6.29 
6.53 
6.17 
6.16 



to a degree longitude changes with latitudes. The distance between neigh- 
boring two cells in N-S direction is always the same distance apart across 
latitudes. Most of the models give good fits to the empirical values and 
model H overall gives the best fit. Notice that model A gives a symmetric 
fit for both Northern and Southern Hemisphere due to its limitation. The 
empirical values in the table (compare NW and NE at 40° N or SE and SW 
at 40° S) suggest that the process is not longitudinally reversible, that is, 
K{L\,L2,l) 7^ K(Li,L2, —I) for some Li,L2 and I, which coincides with the 
finding of Stein (2007) for TOMS Level 2 data. Our fitted models do not 
capture this moderate local anisotropy, despite the fact that some of the 
models allow for longitudinal irreversibility. 

The shapes of the fitted functions A and B as well as P(L;ko, . . . ,k m ) 
[in (4)] for each model are given in Figure 5. The shape of P(L; ko, . . . , k m ) 
resembles the shape of the empirical variances in Figure 1(b). For most 
latitude values, the magnitude of B is close to zero compared to that of 




Fig. 5. The estimated values of P(L; ko, . . . , k m ), A(L), B(L) and A(L)B(L) with respect 
to latitude for the covariance models in Table 1. The color legend is the same as in Figure 1. 



A. In fact, the fit with B set to be zero (model G) is not much worse 
than model H (see Table 1 for loglikelihood values). For model G, most of 
the other parameter values stay nearly the same as in model H except the 
smoothness parameter ui, which is close to the boundary of the parameter 
space (note V\ > 1). This may be a sign of model misfit. 

5. Discussion. In the paper we rather informally compare different mod- 
els using several criterion such as maximized likelihood values or the compar- 
ison between empirical data and fitted values. We would need more formal 
criteria such as AIC or BIC that penalizes the number of parameters if we 
want to perform formal model selection. 

We arbitrarily choose the values of m, n± and n-2 and do not estimate them. 
The point is that with moderate values of these constants, we achieve great 
flexibility of the covariance structure. As in Tables 1 and 2 and Figures 1 
and 5, the performance/behavior of the estimated covariances do not change 
much for models F through J. The models with m = 2 or 4 should not give 
much different covariance structure than m = 3. In choosing these constants, 
there is a trade off between the flexibility of the covariance structure and 
the simplicity of the covariance models (in terms of number of parameters). 
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We demonstrate that the characteristics of nonstationarity in our data 
are similar to those in Stein (2007) that deal with Level 2 data (ungridded 
version). However, it is possible that some of the nonstationarity come from 
the gridding of Level 3 data and this effect would likely be worse if we went 
beyond our latitude range here. 

A Bayesian approach may be a natural alternative for parameter esti- 
mation and imputation of missing data. The computational technique in 
Section 3 requires complete observations and we used a simple imputation 
technique to fill in missing values. For applications with a nonnegligible 
amount of missing data, a Bayesian framework, if computationally feasible, 
may be a better way of imputing the missing values as well as estimating 
the covariance parameters and their uncertainty. 

The nonstationary covariance models developed in this paper have a very 
natural extension to spatial-temporal processes. By introducing a differ- 
ential operator with respect to time in addition to those with respect to 
longitude and latitude, the models should not only capture spatial non- 
stationary patterns, but also create flexible space-time interaction, such as 
space-time asymmetry. Jun and Stein (2007) demonstrate the flexibility of 
this approach in capturing space-time asymmetry in total column ozone 
measurements in the Northern hemisphere, but they only considered a lim- 
ited latitude range and did not allow sufficient flexibility in their models to 
fit the nonstationarity in the spatial variations of total column ozone on a 
global scale. The idea of applying linear combinations of Legendre polynomi- 
als for the coefficients of the differential operators introduced in this paper 
can also be applied to obtain flexible space-time interactions, for example, 
allowing the speed of flow of ozone across longitudes to vary with latitude. 



This appendix describes how each quantity in Figure 1 is calculated. We 
have 100 equally spaced latitude points from —49.5° to 49.5° and we denote 
these Li, . . . ,Lioo- We also have 288 equally spaced longitude points from 
— 179.375° to 179.375° and we denote these Zi,...,/288- Let us also denote 



Z L {-) = ^T,li°iZ(L i ,-) and Z l {-) = ^ £?£! Z{-, k). We let AL = 1° and 
Al = 1.25°. Then: 

(a) Var{Z(-,Z)} = & Ei=i{Z(Li, I) - Z L (l)} 2 for each I. 

(b) Var{Z{L,-)} = ^j: 2 i=i{Z(L,k) - Z l (L)} 2 for each L. 



APPENDIX A 



(c) For each L, 
Yax{Z(L,l) 



-Z(L,l-Al)} 



288 T 288 



1 

286 



£ Z(L, k) - Z(L, - ^ lj) - Z(L, !,-_!)} 



i=2 L 3=2 



18 



M. JUN AND M. L. STEIN 



(d) For each L, 

Var{Z(L, I + Al) - 2Z(L, I) + Z(L, I - Al)} 



2SV 



285 



i=2 



Z(L,l i+ i)-2Z(L,l) + Z(L,li- 1 ) 

287 ] 2 

i=2 

(e) For each Lj, i = 2, . . . , 100, 

Var{Z(Li, Z + Al) - Z(L u l) - Z{L^ x ,l + Al) + ZiL^Vj) 



287 
286 X] 



Z(Li,lj + i) — Z(Li,lj) — Z(Li_i,lj + i) + Z(Li-i,lj) 

287 

fc=l 

- 2 

— Z(Li,lk) — Z(Li_i,lk+i) + Z(Li^i,lk)} 



APPENDIX B 

This appendix gives how quantities in Table 3 are calculated. Suppose the 
fixed latitude level is Li for some i = 2, . . . , 99. Then the empirical variogram 
for SE direction is given by J2]=i{Z(Li, lj) — Z(Li^i,lj + i)} 2 and the cor- 
responding fitted value is K(Li, Li,0) + A'(L,j„i, 0) — 2A(Lj, lj — 
lj+i) for K, an estimated covariance function of Z. Similarly, the empirical 
variogram for S direction is given by lj) — Z(Li_i,lj)} 2 and 

the corresponding fitted value is K(Li, Li, 0) + A(Lj_i, Lj_i, 0) — 2A(Lj, 0). 
The values for the other directions are calculated similarly. 
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